Traffic Analysis in MapReduce

Author

  • Anjana Sharma
Abstract

MapReduce is a programming model for processing large data sets and producing output from them. It does its work with two functions, Map and Reduce. Each Map function is assigned a fragment of the input data and emits intermediate data with keys; the Reducer takes this intermediate data as input and computes the final output. Between Map and Reduce sits the shuffle phase, which sorts and shuffles the mappers' intermediate data and sends it to the Reducers. The shuffle phase generates a large amount of traffic, which in turn leads to even more traffic in the Reduce phase and increases both time and cost. This is tolerable when only one or two Mappers and Reducers are running, but not when thousands of them are processing a job. Work on Big Data has concentrated on load optimization and load balancing and has not considered the traffic between Map and Reduce. To reduce this traffic, we place an Aggregator and a Check function between the two phases. They check the data coming from the shuffle phase, remove the data that is irrelevant to the user, and so reduce the data volume before it is sent to the Reducers. Placing the aggregator reduces traffic, cost, and time, and also reduces the number of Reduce functions needed to process the remaining task. This is achieved using a distributed MapReduce algorithm, and we finally present a performance analysis of the traffic reduction in MapReduce.
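The idea in the abstract can be illustrated with a minimal single-process sketch. It is not the paper's algorithm: the function names (`map_phase`, `check`, `aggregate`, `reduce_phase`, `run_job`), the word-count workload, the `relevant_keys` filter, and the use of "number of pairs sent to the Reducers" as a stand-in for traffic are all assumptions made for illustration. The sketch also assumes the aggregator runs on each mapper's output before that output is sent on.

```python
from collections import defaultdict


def map_phase(fragment):
    """Emit one (word, 1) pair per word in an input fragment."""
    return [(word.lower(), 1) for word in fragment.split()]


def check(pair, relevant_keys):
    """Hypothetical check function: keep only keys the user cares about."""
    return pair[0] in relevant_keys


def aggregate(pairs):
    """Hypothetical aggregator: merge pairs that share a key into one pair."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def reduce_phase(pairs):
    """Sum the values for each key to produce the final output."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(fragments, relevant_keys):
    """Run the job and report pair counts without and with the aggregator."""
    mapped = [map_phase(fragment) for fragment in fragments]
    # Traffic proxy (assumption): pairs that would reach the Reducers unfiltered.
    baseline_traffic = sum(len(pairs) for pairs in mapped)

    # Check, then aggregate, each mapper's output before it is sent on.
    filtered = [
        aggregate([pair for pair in pairs if check(pair, relevant_keys)])
        for pairs in mapped
    ]
    reduced_traffic = sum(len(pairs) for pairs in filtered)

    to_reducers = [pair for pairs in filtered for pair in pairs]
    return reduce_phase(to_reducers), baseline_traffic, reduced_traffic


if __name__ == "__main__":
    fragments = ["error warn error info", "info error debug"]
    result, before, after = run_job(fragments, {"error", "warn"})
    print(result, before, after)
```

In this toy input, filtering and merging shrink the data sent to the Reducers from seven pairs to three, while the counts for the relevant keys are unchanged. A real deployment would measure bytes moved across the network rather than pair counts.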


Similar resources

Evaluating Performance of Distributed Systems With MapReduce and Network Traffic Analysis

Testing, monitoring and evaluation of distributed systems at runtime is a difficult effort, due to the dynamicity of the environment, the large amount of data exchanged between the nodes and the difficulty of reproducing an error for debugging. Application traffic analysis is a method to evaluate distributed systems, but the ability to analyze large amounts of data is a challenge. This paper proposes...


Live Website Traffic Analysis Integrated with Improved Performance for Small Files using Hadoop

Hadoop, an open-source Java framework, deals with big data. It has HDFS (Hadoop Distributed File System) and MapReduce. HDFS is designed to handle a large amount of files across clusters and suffers a performance penalty when dealing with a large number of small files. These large numbers of small files pose a heavy burden on the NameNode of HDFS and increase execution time for MapReduce. Secondly, ...


A Small-time Scale Netflow-based Anomaly Traffic Detecting Method Using MapReduce

Detecting anomalous traffic using Netflow data is one of the important problems in the field of network security. In this paper, we propose an approach using the MapReduce model, realized by observing the entropy and the DFN (Distinct feature number) distribution deviations of traffic features under anomalies at small time scales. MapReduce was used to deal with huge amounts of data...


Phurti: Application and Network-aware Flow Scheduling for Mapreduce

Traffic for a typical MapReduce job in a datacenter consists of multiple network flows. Traditionally, network resources have been allocated to optimize network-level metrics such as flow completion time or throughput. Some recent schemes propose using application-aware scheduling which can reduce the average job completion time. However, most of them treat the core network as a black box with ...


Communication-Aware Traffic Stream Optimization for Virtual Machine Placement in Cloud Datacenters with VL2 Topology

With the pervasiveness of cloud computing, a colossal number of applications from gigantic organizations increasingly tend to rely on cloud services. These demands cause a great number of applications, in the form of a couple of virtual machine (VM) requests, to be executed on data centers’ servers. Some applications are too big to be processed on a single VM. Also, there exists severa...



Publication date: 2016